System and method for visual analysis of on-image gestures

ABSTRACT

A method and system for providing at least a link to a content item related to a multimedia content element respective of an on-image gesture. The method comprises receiving, from a user device, at least one on-image gesture and the multimedia content element; analyzing the at least one on-image gesture to determine at least one portion of the multimedia content element that a user is interested in; generating at least one signature for each of the at least one portion; determining a content item corresponding to the at least one identified portion of the multimedia content element, wherein the determination is based in part on a type of the at least one on-image gesture; and modifying the received multimedia content element to include at least a link to an informative resource containing the content item.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation of U.S. patent application Ser. No. 14/168,811.

TECHNICAL FIELD

The present invention relates generally to the analysis of multimedia content, and more specifically to a system for providing content and links to content displayed as part of a web-page.

BACKGROUND

Web-pages are information resources that are suitable for the World Wide Web (WWW) and can be accessed through a web browser. Web-pages typically contain text and multimedia content elements that are intended for display on a user's display device. Multimedia content elements are generally displayed using portions of code written in, for example, hyper-text mark-up language (HTML) or JavaScript that is inserted into, or otherwise called up by, documents also written in HTML and which are sent to a user node for display.

Multimedia content elements displayed in such web-pages are usually non-interactive: users can view the multimedia content elements but cannot interact with them. At most, the user is enabled to leave some feedback regarding the multimedia content within the web-page. Therefore, if a user wishes to receive information regarding an item viewed in, for example, a video, further search efforts are required.

SUMMARY

Certain embodiments disclosed herein include a method and system for providing at least a link to a content item related to a multimedia content element respective of an on-image gesture. The method comprises receiving, from a user device, at least one on-image gesture and the multimedia content element; analyzing the at least one on-image gesture to determine at least one portion of the multimedia content element that a user is interested in; generating at least one signature for each of the at least one portion; determining a content item corresponding to the at least one identified portion of the multimedia content element, wherein the determination is based in part on a type of the at least one on-image gesture; and modifying the received multimedia content element to include at least a link to an informative resource containing the content item.

BRIEF DESCRIPTION OF THE DRAWINGS

The subject matter disclosed herein is particularly pointed out and distinctly claimed in the claims at the conclusion of the specification. The foregoing and other objects, features, and advantages of the invention will be apparent from the following detailed description taken in conjunction with the accompanying drawings.

FIG. 1 is a schematic block diagram of a network system utilized to describe the various embodiments.

FIG. 2 is a flowchart describing a process of matching an advertisement to multimedia content displayed on a web-page according to an embodiment.

FIG. 3 is a block diagram depicting the basic flow of information in the signature generator system.

FIG. 4 is a diagram showing the flow of patch generation, response vector generation, and signature generation in a large-scale speech-to-text system.

FIG. 5 is a flowchart describing a process for adding a link to multimedia content displayed on a web-page.

FIG. 6 is a flowchart describing a process for analyzing an on-image gesture received from a user device according to an embodiment.

DETAILED DESCRIPTION

It is important to note that the embodiments disclosed herein are only examples of the many advantageous uses of the innovative teachings herein. In general, statements made in the specification of the present application do not necessarily limit any of the various claimed inventions. Moreover, some statements may apply to some inventive features but not to others. In general, unless otherwise indicated, singular elements may be in plural and vice versa with no loss of generality. In the drawings, like numerals refer to like parts through several views.

FIG. 1 shows an exemplary and non-limiting schematic diagram of a network system 100 utilized to describe the disclosed embodiments. A network 110 is used to communicate between different parts of the system. The network 110 may be the Internet, the world-wide-web (WWW), a local area network (LAN), a wide area network (WAN), a metro area network (MAN), and other networks capable of enabling communication between the elements of the system 100.

Further connected to the network 110 are one or more client applications, such as web browsers (WB) 120-1 through 120-n (collectively referred to hereinafter as web browsers 120 or individually as a web browser 120, merely for simplicity purposes). A web browser 120 is executed over a computing device including, for example, a personal computer (PC), a personal digital assistant (PDA), a mobile phone, a smart phone, a tablet computer, a wearable computing device, and other kinds of wired and mobile appliances equipped with browsing, viewing, listening, filtering, and managing capabilities that are enabled as further discussed herein below. Each of the web browsers 120 may be implemented as an independent or plug-in application.

A server 130 is further connected to the network 110 and is configured to perform, in part, the embodiments disclosed herein. A request to the server 130 to analyze a multimedia content item can be sent by a script executed by a web browser 120 in the web-page in response to the uploading of one or more multimedia content items to the web-page. Such a request may include a URL of the web-page or a copy of the web-page. The system 100 also includes a signature generator system (SGS) 140. In one embodiment, the SGS 140 is connected to the server 130. The server 130 is enabled to receive and serve multimedia content and causes the SGS 140 to generate a signature respective of the multimedia content. The process for generating the signatures for multimedia content is explained in more detail herein below with respect to FIGS. 3 and 4.

It should be noted that each of the server 130 and the SGS 140 typically comprises a processing unit (not shown), such as a processor, a CPU, and the like, that is coupled to a memory. The memory contains instructions that can be executed by the processing unit. The server 130 also includes an interface (not shown) to the network 110. In one embodiment, the server 130 is communicatively connected to, or includes, an array of Computational Cores configured as discussed in more detail below.

A plurality of web servers 150-1 through 150-m are also connected to the network 110, each of which is configured to generate and send multimedia content items to the server 130. The web servers 150-1 through 150-m typically, but not necessarily exclusively, are resources for information that can be associated with a multimedia content element sent from a web browser 120. For example, a web server 150-1 may host the Wikipedia website.

The system 100 may be configured to generate customized channels of multimedia content. Accordingly, a web browser 120 or a client channel manager application (not shown), available on either the server 130 or the web browser 120, or as an independent or plug-in application, may enable a user to create customized channels of multimedia content by receiving selections made by the user as inputs. Such customized channels of multimedia content are personalized content channels that are generated in response to selections made by a user of the web browser 120 or the client channel manager application. The system 100, and in particular the server 130 in conjunction with the SGS 140, determines which multimedia content is more suitable to be viewed, played, or otherwise utilized by the user with respect to a given channel, based on the signatures of selected multimedia content. These channels may optionally be shared with other users, used and/or further developed cooperatively, and/or sold to other users or providers, and so on. The process for defining, generating, and customizing the channels of multimedia content is described in greater detail in the co-pending Ser. No. 13/344,400 application referenced above.

According to the embodiments disclosed herein, the server 130 is configured to carry out a process for providing a content item, or a link to an information resource containing the content item, associated with an input multimedia content element respective of an on-image gesture, an event, or a combination thereof. The on-image gesture and/or event are received from a web browser 120. In response, the server 130 returns a modified web-page including the multimedia content element with the determined content item or a link thereto.

The on-image gesture or combination of gestures may include, but is not limited to: one or more touch gestures, one or more scrolls over at least a portion of the multimedia content element, one or more clicks over at least a portion of the multimedia content element, one or more responses to at least a portion of the multimedia content element, a combination thereof, a portion thereof, and so on. The touch gestures may be related to computing devices with a touch screen display, and such gestures include, but are not limited to, tapping on a content element, resizing a content element, swiping over a content element, changing the display orientation, and so on. In an embodiment, gestures detected by the web browser can be sent in combination with one or more events. The event or combination of events may include, but is not limited to, a predetermined period of time in which a user views or interacts with the multimedia content element.

The server 130 is further configured to analyze the received on-image gestures and/or events to determine at least one portion of the received multimedia content element that is of particular interest to the user. Then, the server 130, by means of the SGS 140, is configured to generate a signature for each identified portion. Using the generated signatures and the type of the received on-image gesture and/or event, a search for content items relevant to the identified portion is performed. Thereafter, relevant content items, or links thereto, can be added as an overlay to the received multimedia content element displayed on a web-page.

A multimedia content element and a content item may include, for example, an image, a graphic, a video stream, a video clip, an audio stream, an audio clip, a video frame, a photograph, an image of signals (e.g., spectrograms, phasograms, scalograms, etc.), and/or combinations thereof and portions thereof.

It should be noted that the server 130 may analyze all or a sub-set of the multimedia content elements contained in the web-page. The SGS 140 generates at least one signature for portions of each multimedia content element provided by the server 130. The generated signature(s) may be robust to noise and distortion, as discussed below. Then, using the generated signature(s), the server 130 is capable of matching the signature of the multimedia content element to the signature of a web-page accessible by a link, and of providing the matched link. Such links may be extracted from the data warehouse 160. For example, if the signature of an image indicates the city of New York, then a link to the municipal website of the city of New York may be determined.

For instance, in order to provide a matching content item for a sports car, it may be desirable to locate a car of a particular model. However, in most cases, the model of the car would not be part of the metadata associated with the multimedia content (image). Moreover, the car shown in an image may be displayed at an angle that differs from the angle of a specific photograph of the car that is available for use as a search item. The signature generated for that image would nevertheless enable accurate recognition of the model of the car, because the signatures generated for the multimedia content elements according to the disclosed embodiments allow for recognition and classification of multimedia elements. Such signatures support content-tracking, video filtering, multimedia taxonomy generation, video fingerprinting, speech-to-text, audio classification, element recognition, video/image search, and any other application requiring content-based signature generation and matching for large content volumes, such as the web and other large-scale databases.

In one embodiment, the signatures generated for more than one multimedia content element are clustered. The clustered signatures are used to search for matching content items and to select one or more of the matching content items. The one or more selected matching content items are retrieved from the data warehouse 160 and uploaded to the web-page on the web browser 120 by means of one of the web servers 150.
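As a non-limiting illustration, the clustering step can be sketched in Python. This is a minimal sketch under stated assumptions, not the disclosed implementation: each signature is assumed to be represented as the set of node indices that fired, similarity is assumed to be the Jaccard ratio, and the greedy single-pass grouping and the `min_sim` parameter are hypothetical choices.

```python
from typing import FrozenSet, List

Signature = FrozenSet[int]  # indices of the nodes that fired (response value 1)


def jaccard(a: Signature, b: Signature) -> float:
    """Similarity of two binary signatures: shared active nodes over all active nodes."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_signatures(sigs: List[Signature], min_sim: float) -> List[List[Signature]]:
    """Greedy single pass: join the first cluster whose seed is similar enough, else start one."""
    clusters: List[List[Signature]] = []
    for sig in sigs:
        for cluster in clusters:
            if jaccard(cluster[0], sig) >= min_sim:
                cluster.append(sig)
                break
        else:
            # No existing cluster was similar enough: this signature seeds a new one
            clusters.append([sig])
    return clusters


# Hypothetical signatures for three multimedia content elements on one web-page
sigs = [frozenset({1, 2, 3, 4}), frozenset({1, 2, 3, 5}), frozenset({10, 11, 12})]
clusters = cluster_signatures(sigs, min_sim=0.5)
print([len(c) for c in clusters])  # [2, 1]
```

A single-pass scheme depends on the order of its input; a system that needs order-independent clusters might use a different clustering method.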

FIG. 2 depicts an exemplary and non-limiting flowchart 200 describing the process of matching an advertisement to a multimedia content element displayed on a web-page. In S205, the method starts when a web-page is uploaded to one of the web-browsers (e.g., web-browser 120-1). In S210, a request to match at least one multimedia content element contained in the uploaded web-page to an appropriate content item is received. The request can be received from a web server (e.g., a server 150-1), a script running on the uploaded web-page, or an agent (e.g., an add-on) installed in the web-browser. S210 can also include extracting the multimedia content elements and requesting that respective signatures be generated.

In S220, a signature of the multimedia content element is generated. The generation of the signature by a signature generator is described below. In S230, an advertisement item is matched to the multimedia content element respective of its generated signature. In one embodiment, the matching process includes searching for at least one advertisement item with a matching signature respective of the signature of the multimedia content and displaying the at least one advertisement item within the display area of the web-page. In one embodiment, the matching of an advertisement to a multimedia content element can be performed by the computational cores that are part of the large-scale matching system discussed in detail below.

In S240, upon a user's gesture, the matched advertisement item is uploaded to the web-page and displayed therein. The user's gesture may be a scroll on the multimedia content element, a tap on the multimedia content element, and/or a response to the multimedia content. This ensures that the user's attention is given to the content item, by providing the advertised content only when the user has become interested in the multimedia content element. In S250, it is checked whether there are additional requests to analyze multimedia content elements and, if so, execution continues with S210; otherwise, execution terminates.
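The deferred display of S240 can be illustrated with a short Python sketch. The class name, the element identifiers, and the gesture names in `TRIGGER_GESTURES` are hypothetical; the sketch only shows that the advertisement matched in S230 is held back until a qualifying gesture is reported for the same element.

```python
from typing import Dict, Optional

# Hypothetical names for the gestures listed in S240
TRIGGER_GESTURES = {"scroll", "tap", "response"}


class DeferredAdSlot:
    """Holds the advertisement matched in S230 until a qualifying gesture arrives (S240)."""

    def __init__(self) -> None:
        self._matched: Dict[str, str] = {}  # element id -> matched advertisement item

    def store_match(self, element_id: str, ad_item: str) -> None:
        self._matched[element_id] = ad_item

    def on_gesture(self, element_id: str, gesture: str) -> Optional[str]:
        # Reveal the advertisement only once the user shows interest in this element
        if gesture in TRIGGER_GESTURES:
            return self._matched.get(element_id)
        return None


slot = DeferredAdSlot()
slot.store_match("sea-shore.jpg", "swimsuit-banner")
print(slot.on_gesture("sea-shore.jpg", "scroll"))  # swimsuit-banner
```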

As a non-limiting example, a user uploads a web-page that contains an image of a sea shore. The image is then analyzed and a signature is generated respective thereto. Respective of the image signature, an advertisement item (e.g., a banner) is matched to the image, for example, a swimsuit advertisement. Upon detection of a user's gesture, for example, a mouse scrolling over the sea shore image, the swimsuit ad is displayed.

The web-page may contain a number of multimedia content elements; however, in some instances only a few advertisement items may be displayed in the web-page. Accordingly, in one embodiment, the signatures generated for the multimedia content elements are clustered and the cluster of signatures is matched to one or more advertisement items.

FIGS. 3 and 4 illustrate the generation of signatures for the multimedia content elements by the SGS 140 according to one embodiment. An exemplary high-level description of the process for large-scale matching is depicted in FIG. 3. In this example, the matching is for video content.

Video content segments 2 from a Master database (DB) 6 and a Target DB 1 are processed in parallel by a large number of independent computational Cores 3 that constitute an architecture for generating the Signatures (hereinafter the “Architecture”). Further details on the computational Cores generation are provided below. The independent Cores 3 generate a database of Robust Signatures and Signatures 4 for Target content-segments 5 and a database of Robust Signatures and Signatures 7 for Master content-segments 8. An exemplary and non-limiting process of signature generation for an audio component is shown in detail in FIG. 4. Finally, Target Robust Signatures and/or Signatures are effectively matched, by a matching algorithm 9, to the Master Robust Signatures and/or Signatures database to find all matches between the two databases.

To demonstrate an example of the signature generation process, it is assumed, merely for the sake of simplicity and without limitation on the generality of the disclosed embodiments, that the signatures are based on a single frame, leading to certain simplification of the computational cores generation. The Matching System is extensible for signature generation capturing the dynamics in-between the frames.

The Signatures' generation process will now be described with reference to FIG. 4. The first step in the process of signature generation from a given speech-segment is to break down the speech-segment into K patches 14 of random length P and random position within the speech segment 12. The breakdown is performed by the patch generator component 21. The values of the number of patches K, the random length P, and the random position parameters are determined based on optimization, considering the tradeoff between accuracy rate and the number of fast matches required in the flow process of the server 130 and SGS 140. Thereafter, all the K patches are injected in parallel into all computational Cores 3 to generate K response vectors 22, which are fed into a signature generator system 23 to produce a database of Robust Signatures and Signatures 4.
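A minimal Python sketch of the patch-generation step, assuming the speech-segment is available as a list of samples. The function name and the `min_len`/`max_len` bounds are hypothetical stand-ins for the optimized values of K, P, and position described above; a seeded random generator is used so that the patches are reproducible.

```python
import random
from typing import List, Sequence, Tuple


def make_patches(segment: Sequence[float], k: int, min_len: int, max_len: int,
                 rng: random.Random) -> List[Tuple[int, Sequence[float]]]:
    """Return k (start, patch) pairs, each with a random length P and a random position."""
    n = len(segment)
    if not 1 <= min_len <= max_len <= n:
        raise ValueError("need 1 <= min_len <= max_len <= len(segment)")
    patches = []
    for _ in range(k):
        p = rng.randint(min_len, max_len)   # random length P (inclusive bounds)
        start = rng.randint(0, n - p)       # random position, chosen so the patch fits
        patches.append((start, segment[start:start + p]))
    return patches


segment = [0.01 * i for i in range(1000)]  # stand-in for speech samples
patches = make_patches(segment, k=8, min_len=50, max_len=200, rng=random.Random(42))
```

Each returned patch would then be fed to every computational Core 3 to obtain the K response vectors 22.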

To generate Robust Signatures, i.e., Signatures that are robust to additive noise, a frame ‘i’ is injected into all L Computational Cores 3, where L is an integer equal to or greater than 1. The Cores 3 then generate two binary response vectors: $\vec{S}$, which is a Signature vector, and $\vec{RS}$, which is a Robust Signature vector.

For generation of signatures robust to additive noise, such as White-Gaussian-Noise, scratch, etc., but not robust to distortions, such as crop, shift, and rotation, a core $C_i = \{n_i\}$ ($1 \le i \le L$) may consist of a single leaky integrate-to-threshold unit (LTU) node or more nodes. The node $n_i$ equations are:

$$V_i = \sum_j w_{ij} k_j, \qquad n_i = \theta\left(V_i - Th_x\right)$$

where $\theta$ is a Heaviside step function; $w_{ij}$ is a coupling node unit (CNU) between node $i$ and image component $j$; $k_j$ is an image component $j$ (for example, the grayscale value of a certain pixel $j$); $Th_x$ is a constant Threshold value, where $x$ is ‘S’ for Signature and ‘RS’ for Robust Signature; and $V_i$ is a Coupling Node Value.
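The node equations can be checked with a small Python sketch. The weights, pixel values, and threshold values below are hypothetical, with $Th_{RS}$ set above $Th_S$ so that every node in the Robust Signature is also in the Signature; taking the step function as 0 at exactly zero is an assumption.

```python
from typing import List, Sequence


def coupling_values(w: Sequence[Sequence[float]], k: Sequence[float]) -> List[float]:
    """V_i = sum_j w_ij * k_j for every node i."""
    return [sum(w_ij * k_j for w_ij, k_j in zip(row, k)) for row in w]


def heaviside(x: float) -> int:
    # Step function; the value at x == 0 is taken as 0 here (an assumption)
    return 1 if x > 0 else 0


def node_responses(w: Sequence[Sequence[float]], k: Sequence[float], th: float) -> List[int]:
    """n_i = theta(V_i - Th_x): one binary response per node."""
    return [heaviside(v - th) for v in coupling_values(w, k)]


# Hypothetical 3-node, 4-pixel example
w = [[0.5, -0.2, 0.1, 0.3],
     [-0.4, 0.6, 0.2, -0.1],
     [0.2, 0.2, 0.2, 0.2]]
k = [0.9, 0.1, 0.5, 0.7]  # grayscale values k_j
th_s, th_rs = 0.3, 0.5    # hypothetical Th_S and Th_RS

signature = node_responses(w, k, th_s)  # Signature vector S
robust = node_responses(w, k, th_rs)    # Robust Signature vector RS
print(signature, robust)                # [1, 0, 1] [1, 0, 0]
```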

The Threshold values $Th_x$ are set differently for Signature generation and for Robust Signature generation. For example, for a certain distribution of $V_i$ values (for the set of nodes), the thresholds for Signature ($Th_S$) and Robust Signature ($Th_{RS}$) are set apart, after optimization, according to one or more of the following criteria:

1: For $V_i > Th_{RS}$:

$$1 - p\left(V > Th_S\right)^{l} = 1 - (1 - \varepsilon)^{l} \ll 1$$

i.e., given that $l$ nodes (cores) constitute a Robust Signature of a certain image $I$, the probability that not all of these $l$ nodes will belong to the Signature of the same, but noisy, image $\tilde{I}$ is sufficiently low (according to a system's specified accuracy). Here $\varepsilon$ denotes the probability that a single such node falls below $Th_S$ for the noisy image.

2:

$$p\left(V_i > Th_{RS}\right) \approx l/L$$

i.e., approximately $l$ out of the total $L$ nodes can be found to generate a Robust Signature according to the above definition.

3: Both a Robust Signature and a Signature are generated for a certain frame $i$.
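Criteria 1 and 2 can be made concrete with a short Python sketch. It assumes, as one possible reading, that $Th_{RS}$ is placed between the $l$-th and $(l+1)$-th largest coupling node values so that about $l$ of the $L$ nodes exceed it, and that $\varepsilon$ is the per-node drop-out probability under noise; the function names and numbers are hypothetical.

```python
from typing import Sequence


def robust_threshold(v_values: Sequence[float], l: int) -> float:
    """Place Th_RS so that about l of the L coupling node values exceed it (criterion 2).

    Assumes distinct values; ties at the boundary would change the count.
    """
    ranked = sorted(v_values, reverse=True)
    if not 1 <= l < len(ranked):
        raise ValueError("need 1 <= l < L")
    # Midpoint between the l-th and (l+1)-th largest values
    return (ranked[l - 1] + ranked[l]) / 2


def prob_not_all_survive(eps: float, l: int) -> float:
    """1 - (1 - eps)^l: chance that at least one of l robust nodes drops out (criterion 1)."""
    return 1 - (1 - eps) ** l


v = [0.1, 0.9, 0.4, 0.7, 0.2]        # hypothetical V_i values, L = 5
th_rs = robust_threshold(v, l=2)      # midpoint of 0.7 and 0.4
print(sum(x > th_rs for x in v))      # 2
print(prob_not_all_survive(0.001, 10))
```

With a per-node drop-out probability of 0.001 and $l = 10$, the probability that the noisy image loses any robust node is slightly under 1%, which is the kind of "sufficiently low" value criterion 1 asks for.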

It should be understood that the generation of a signature is unidirectional and typically yields lossy compression, where the characteristics of the compressed data are maintained but the uncompressed data cannot be reconstructed. Therefore, a signature can be used for the purpose of comparison to another signature without the need for comparison to the original data. A detailed description of the Signature generation can be found in U.S. Pat. Nos. 8,326,775 and 8,312,031, assigned to the common assignee, which are hereby incorporated by reference for all the useful information they contain.

A Computational Core generation is a process of definition, selection, and tuning of the parameters of the cores for a certain realization in a specific system and application. The process is based on several design considerations, such as:

(a) The Cores should be designed so as to obtain maximal independence, i.e., the projection from a signal space should generate a maximal pair-wise distance between any two cores' projections into a high-dimensional space.

(b) The Cores should be optimally designed for the type of signals, i.e., the Cores should be maximally sensitive to the spatio-temporal structure of the injected signal, for example, and in particular, sensitive to local correlations in time and space. Thus, in some cases a core represents a dynamic system, such as in state space, phase space, edge of chaos, etc., which is uniquely used herein to exploit its maximal computational power.

(c) The Cores should be optimally designed with regard to invariance to a set of signal distortions of interest in relevant applications.

A detailed description of the Computational Core generation, the computational architecture, and the process for configuring such cores is provided in the co-pending U.S. patent application Ser. No. 12/084,150 referenced above.

FIG. 5 depicts an exemplary and non-limiting flowchart 500 describing the process of adding an overlay to multimedia content displayed on a web-page. In S510, the method starts when a web-page is uploaded to a web-browser (e.g., web-browser 120-1). In another embodiment, the method starts when a web-server (e.g., web-server 150-1) receives a request to host the requested web-page. In S515, the server 130 receives the uniform resource locator (URL) of the uploaded web-page. In another embodiment, the uploaded web-page includes an embedded script. The script extracts the URL of the web-page and sends the URL to the server 130. In another embodiment, an add-on installed in the web-browser 120 extracts the URL of the uploaded web-page and sends the URL to the server 130. In yet another embodiment, an agent is installed on a user device executing the web browser 120. The agent is configured to monitor web-pages uploaded to the web browser 120, determine when a web-page has been uploaded, extract its URL, and send the URL to the server 130. In another embodiment, a web-server (e.g., server 150) hosting the requested web-page provides the server 130 with the URL of the requested web-page. It should be noted that only URLs of selected web-sites may be sent to the server 130, for example, URLs related to web-sites that paid for the additional information.
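The filtering of URLs by web-site mentioned at the end of S515 might be sketched as follows, assuming the set of participating hosts is known in advance. The function name and host names are hypothetical; only the standard-library `urllib.parse.urlparse` is used.

```python
from typing import Iterable, List, Set
from urllib.parse import urlparse


def urls_to_report(urls: Iterable[str], participating_hosts: Set[str]) -> List[str]:
    """Keep only URLs whose host is on the (hypothetical) list of participating web-sites."""
    selected = []
    for url in urls:
        host = urlparse(url).hostname or ""  # .hostname is returned lower-cased
        if host in participating_hosts:
            selected.append(url)
    return selected


reported = urls_to_report(
    ["https://news.example.com/story", "https://blog.example.org/post"],
    participating_hosts={"news.example.com"},
)
print(reported)  # ['https://news.example.com/story']
```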

In S520, the server 130 downloads the web-page respective of each received URL. In S525, the server 130 analyzes the web-page in order to identify the existence of one or more multimedia content elements in the uploaded web-page. It should be understood that multimedia content, such as an image or a video, may include a plurality of multimedia content elements. In S530, the SGS 140 generates at least one signature for each multimedia content element identified by the server 130. The signatures for the multimedia elements are generated as described in greater detail above.

In S540, respective of each signature, the server 130 determines one or more links to content that exists on a web server, for example, any of the web servers 150-1 through 150-m, that can be associated with the multimedia element. A link may be a hyperlink, a URL, and the like. The content accessed through the link may be, for example, an informative web-page such as a Wikipedia® article. The link may be determined by identifying the context of the signatures generated by the server 130. For example, if a multimedia content element is identified as a football player, a signature is generated respective thereto, and a link to a sports website that contains information about the football player is determined. In S550, the determined link to the content is added as an overlay to the web-page by the server 130, respective of the corresponding multimedia content element. According to one embodiment, the overlay containing the link may be provided to a web browser respective of a user's gesture. A user's gesture may be, for example, a click on the multimedia content element through, for example, a computer mouse, a touch pad, or a touch screen; and/or a response to the multimedia content (e.g., movement detected by a motion sensor, noise detected by a microphone, etc.).
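The final step of S540, going from a recognized context to a link, can be sketched as a table lookup. This assumes that the concepts or context for the element have already been derived from its signatures, which the sketch does not model; the table entries and URLs are hypothetical placeholders.

```python
from typing import Dict, List, Optional

# Hypothetical table: recognized concept -> link to an informative resource
LINK_TABLE: Dict[str, str] = {
    "football player": "https://sports.example.com/players",
    "new york city": "https://city.example.gov/",
}


def determine_link(concepts: List[str], table: Dict[str, str] = LINK_TABLE) -> Optional[str]:
    """Return the link for the first recognized concept that has an entry in the table."""
    for concept in concepts:
        link = table.get(concept.lower())
        if link is not None:
            return link
    return None


print(determine_link(["Stadium", "Football Player"]))  # https://sports.example.com/players
```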

The modified web-page that includes at least one multimedia element with the added link can be sent directly to the web browser (e.g., browser 120-1) requesting the web-page. This requires establishing a data session between the server 130 and the web browsers 120. In another embodiment, the multimedia element including the added link is returned to a web server (e.g., server 150-1) hosting the requested web-page. The web server (e.g., server 150-1) subsequently returns the requested web-page with the multimedia element containing the added link to the web browser (e.g., browser 120-1) requesting the web-page. Once the “modified” web page is displayed over the web browser, a detected event or user's gesture with respect to the multimedia content element would cause the browser to upload the content (e.g., a Wikipedia® article web page) addressed by the link added to the multimedia element.

In S560, it is checked whether the one or more multimedia content elements contained in the web-page have changed; if so, execution continues with S525; otherwise, execution terminates.

Different portions of the multimedia content element may be associated with different server content or links to server content. As a non-limiting example, a web-page related to cinema is uploaded, and an image of the movie “Pretty Woman” showing actor Richard Gere and actress Julia Roberts is identified within the web-page by the server 130. Signatures are generated by the SGS 140 respective of the actor Richard Gere and the actress Julia Roberts, both shown as portions of the image. A link to Richard Gere's biography on the Wikipedia® website and a link to Julia Roberts' biography on the Wikipedia® website are then determined respective of the signatures and the context of the signatures, as further described herein above. The context of the signatures according to this example may be “American Movie Actors.”

An overlay containing the links to Richard Gere's biography on the Wikipedia® website and Julia Roberts' biography on the Wikipedia® website is added over the image such that, upon detection of a specified event or a user's gesture, for example, a mouse click on the part of the image where Richard Gere is shown, the link to Richard Gere's biography on Wikipedia® is provided to the user.

According to another embodiment, a request for a URL of a web-page that contains an embedded video clip is received. The video content within the requested web-page is analyzed and a signature is generated respective of the entertainer Madonna, who is shown in the video content. A link to Madonna's official web-page hosted on a web-server 150-n is then determined respective of the signature as further described herein above. An overlay containing the link to Madonna's official web-page is then added over the video content. The web-page together with the link to Madonna's official web-page is then sent to the web server 150-1. Then, the requested web-page with the modified video element is uploaded to the web-browser 120-1.

The web-page may contain a number of multimedia content elements; however, in some instances only a few links may be displayed in the web-page. Accordingly, in one embodiment, the signatures generated for the multimedia content elements are clustered and the cluster of signatures is matched to one or more content items.

FIG. 6 depicts an exemplary and non-limiting flowchart 600 describing a method of analyzing an on-image gesture received from a user device and providing a content item respective thereof according to an embodiment. The method can be performed by the server 130 using the SGS 140.

In S610, the method starts when at least a portion of a multimedia content element from a web-page, as well as at least one gesture, event, or combination thereof, is received. The on-image gestures and the multimedia content are captured and sent by a web-browser (e.g., WB 120-1) executed over a user device. In an embodiment, a URL of the web-page and an identifier of the multimedia content associated with the detected gesture and/or event are provided. On-image gestures may include, but are not limited to: one or more touch gestures, one or more scrolls over at least a portion of the multimedia content element, one or more clicks over at least a portion of the multimedia content element, one or more responses to at least a portion of the multimedia content element, a combination thereof, a portion thereof, and so on. The touch gestures may be related to computing devices with a touch screen display, and such gestures may include, but are not limited to, tapping on a content element, resizing a content element, swiping over a content element, changing the display orientation, and so on. In an embodiment, gestures detected by the web-browser can be sent together with one or more events. Alternatively, the web-browser 120 can send only events related to the interaction of a user with the content element. Events may include, but are not limited to, a predetermined period of time in which a user views or interacts with the multimedia content element.
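One possible shape for the data received in S610 is sketched below as Python dataclasses. The field names, the gesture and event names, and the bounding-box convention are assumptions for illustration, not a defined message format.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Gesture:
    kind: str  # e.g. "tap", "swipe", "resize", "click", "scroll" (hypothetical names)
    region: Optional[Tuple[int, int, int, int]] = None  # (x0, y0, x1, y1) on the element


@dataclass
class Event:
    kind: str  # e.g. "dwell" for the time spent viewing or interacting
    duration_s: float = 0.0


@dataclass
class GestureReport:
    page_url: str    # URL of the web-page
    element_id: str  # identifier of the multimedia content element
    gestures: List[Gesture] = field(default_factory=list)
    events: List[Event] = field(default_factory=list)

    def is_valid(self) -> bool:
        # S610 requires at least one gesture or one event; events may be sent alone
        return bool(self.gestures or self.events)


report = GestureReport("https://example.com/page", "img-1",
                       gestures=[Gesture("tap", (120, 80, 300, 160))],
                       events=[Event("dwell", 4.0)])
```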

In S620, the received gestures and/or events are analyzed to determine at least one portion of the received multimedia content element that is of particular interest to the user. As a non-limiting example, if a multimedia content element is an image featuring a man and a boat, and the user zooms in on the boat (an event of expanding a part of a screen that demonstrates an interest in the particular portion of the image that is expanded), the boat is determined to be the portion of the multimedia content element that is of particular interest to the user.
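The zoom example of S620 can be sketched in Python, assuming the candidate objects in the image (here, the man and the boat) and the zoomed-in viewport are both available as axis-aligned boxes in image coordinates. Scoring each object by the fraction of its area that falls inside the viewport is a hypothetical heuristic, not the disclosed analysis.

```python
from typing import Dict, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1) with x1 > x0 and y1 > y0


def intersection_area(a: Box, b: Box) -> float:
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(w, 0.0) * max(h, 0.0)


def portion_of_interest(objects: Dict[str, Box], viewport: Box) -> Optional[str]:
    """Pick the object whose area is most covered by the zoomed-in viewport."""
    best, best_score = None, 0.0
    for name, box in objects.items():
        area = (box[2] - box[0]) * (box[3] - box[1])
        score = intersection_area(box, viewport) / area if area > 0 else 0.0
        if score > best_score:
            best, best_score = name, score
    return best


objects = {"man": (10, 20, 60, 180), "boat": (120, 80, 300, 160)}
zoom_viewport = (110, 60, 320, 200)  # the user zoomed in on the right side of the image
print(portion_of_interest(objects, zoom_viewport))  # boat
```

A fraction-covered score favors small objects that sit entirely inside the viewport; weighting by object size would be an alternative design choice.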

In S630, at least one signature is generated for each portion of the multimedia content element identified in S620. The signatures for the multimedia content elements are generated as described in greater detail above.

In S640, respective of the at least one signature of each portion of the multimedia content element, the received on-image gestures and/or events corresponding to that portion of the multimedia content element are determined. Each different gesture, event, set of gestures, set of events, and combination thereof received from a user can be differentiated and associated with different links or content from a server. As an example, a click on a portion of the multimedia content may be determined as a first gesture associated with, e.g., a link to a Wikipedia® article, and a double click on that portion may be determined as a different gesture associated with push data being delivered to the user. In an embodiment, a preconfigured table providing a mapping from a type of gesture, a type of event, or a combination of a gesture and an event to the type of content item and its delivery method is saved in the data warehouse 160 and is accessible by the server 130. Furthermore, one of ordinary skill should appreciate that the content item associated with an on-image gesture can be a graphic, a video stream, a video clip, an audio stream, an audio clip, a video frame, or a photograph.
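A minimal sketch of the preconfigured mapping table of S640, assuming it is loaded from the data warehouse 160 into memory. Keys are frozensets so that a combination of gestures and events matches regardless of the order in which they were reported; the gesture names, content types, and delivery methods are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, NamedTuple, Optional


class Delivery(NamedTuple):
    content_type: str  # e.g. "link" or "push"
    method: str        # e.g. "overlay" or "notification"


# Hypothetical preconfigured table: gesture/event combination -> content type and delivery
GESTURE_TABLE: Dict[FrozenSet[str], Delivery] = {
    frozenset({"click"}): Delivery("link", "overlay"),
    frozenset({"double_click"}): Delivery("push", "notification"),
    frozenset({"tap", "dwell"}): Delivery("link", "overlay"),
}


def lookup_delivery(gestures_and_events: Iterable[str],
                    table: Dict[FrozenSet[str], Delivery] = GESTURE_TABLE) -> Optional[Delivery]:
    """Return the configured delivery for this exact combination, or None if unmapped."""
    return table.get(frozenset(gestures_and_events))


print(lookup_delivery(["dwell", "tap"]))  # Delivery(content_type='link', method='overlay')
```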

In S650, respective of each signature for the portion of the multimedia content element and the corresponding gestures and/or events, a search is performed for content items that can be associated with the multimedia element respective of the gestures and/or events. This determination may be performed by matching the signatures generated for the portion of the multimedia content element with those of potential content items. The search for such content items is performed by the web servers 150 using a data warehouse 160. A content item is determined to be related to the portion of the multimedia content element when their respective signatures (as generated by the SGS 140) match. The signature matching process is described in more detail above. In an exemplary embodiment, two signatures may be considered matching when they overlap by more than a predetermined threshold level, for example, when 60% of the signatures match.
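The threshold test of S650 can be sketched as follows. The description does not define how overlap is measured, so the sketch assumes Jaccard similarity between signatures represented as sets of active core indices; `overlap`, `matching_items`, and the in-memory `index` standing in for the data warehouse 160 are hypothetical.

```python
from typing import Dict, FrozenSet, List


def overlap(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Jaccard overlap of two signatures (one possible overlap measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def matching_items(query: FrozenSet[int], index: Dict[str, FrozenSet[int]],
                   threshold: float = 0.6) -> List[str]:
    """Return IDs of indexed content items whose overlap with `query`
    exceeds `threshold`, best match first."""
    scored = [(overlap(query, sig), item) for item, sig in index.items()]
    # "More than" the threshold, per the text, so the comparison is strict.
    return [item for score, item in sorted(scored, reverse=True) if score > threshold]
```

With a query signature `{0..9}`, an item signed `{0..7}` scores 0.8 and matches, while an item signed `{5..14}` scores 5/15 and does not; a score of exactly 0.6 is rejected because the comparison is strict.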

In an embodiment, the search for relevant content items is not limited to the data warehouse. The search can be performed, using the signatures generated by the SGS 140 and the identified context, in data sources that index searchable content, including, but not limited to, multimedia content items, using signatures and concepts. A context is determined as the correlation between a plurality of concepts. An example of such indexing techniques using signatures is disclosed in co-pending U.S. patent application Ser. No. 13/766,463, filed Feb. 13, 2013, entitled “A SYSTEM AND METHODS FOR GENERATION OF A CONCEPT BASED DATABASE”, assigned to common assignee, which is hereby incorporated by reference for all the useful information it contains.

In one embodiment, the signatures generated for more than one unstructured data element are clustered. The clustered signatures are used to search for a common concept. A concept is a collection of signatures representing elements of the unstructured data, together with metadata describing the concept. As a non-limiting example, a ‘Superman concept’ is a signature-reduced cluster of signatures describing elements (such as multimedia elements) related to, e.g., a Superman cartoon, together with a set of metadata providing a textual representation of the Superman concept. Techniques for generating concepts and concept structures are also described in co-pending U.S. patent application Ser. No. 12/603,123 (hereinafter the '123 Application) to Raichelgauz et al., which is assigned to common assignee and is hereby incorporated by reference for all that it contains.
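The clustering of signatures into concepts can be illustrated with a simple greedy procedure. The sketch assumes that a cluster's "signature-reduced" form keeps the cores active in at least half of its members and that a signature joins the first concept whose reduced signature it overlaps by more than 60%; both choices, and the names `Concept`, `cluster`, and `jaccard`, are assumptions rather than the technique of the '123 Application.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List


@dataclass
class Concept:
    members: List[FrozenSet[int]] = field(default_factory=list)
    metadata: Dict[str, str] = field(default_factory=dict)  # textual description

    def reduced(self) -> FrozenSet[int]:
        """Cores active in at least half of the member signatures (assumed rule)."""
        counts: Dict[int, int] = {}
        for sig in self.members:
            for core in sig:
                counts[core] = counts.get(core, 0) + 1
        need = (len(self.members) + 1) // 2  # ceil(n / 2)
        return frozenset(c for c, n in counts.items() if n >= need)


def jaccard(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(signatures: List[FrozenSet[int]], threshold: float = 0.6) -> List[Concept]:
    """Greedily assign each signature to the first sufficiently similar concept."""
    concepts: List[Concept] = []
    for sig in signatures:
        for concept in concepts:
            if jaccard(sig, concept.reduced()) > threshold:
                concept.members.append(sig)
                break
        else:
            # No existing concept matched: start a new one.
            concepts.append(Concept(members=[sig]))
    return concepts
```

Once clusters exist, the textual part of a concept can be attached separately, e.g. `concepts[0].metadata["label"] = "Superman"`.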

In S660, the determined related content, or a link to the determined content, is added as an overlay to the web-page respective of the corresponding multimedia content element and the corresponding gestures and/or events. According to one embodiment (not shown), a vocabulary of the determined gestures and/or events may be provided as part of the overlay. Such a vocabulary may include, but is not limited to, one or more gestures and/or events and a description of the corresponding server content, or links to server content, that will be provided upon occurrence of the one or more gestures and/or events.
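As a sketch of the overlay vocabulary of S660, the function below renders a mapping from gestures to descriptions of the content they trigger as an HTML list. The markup, the `gesture-vocabulary` class name, and `render_vocabulary` are hypothetical; `html.escape` from the Python standard library keeps the descriptions from injecting markup into the page.

```python
import html
from typing import Dict


def render_vocabulary(vocab: Dict[str, str]) -> str:
    """Render gesture -> description pairs as an HTML list (hypothetical markup)."""
    items = "".join(
        f"<li><b>{html.escape(gesture)}</b>: {html.escape(desc)}</li>"
        for gesture, desc in vocab.items()
    )
    return f'<ul class="gesture-vocabulary">{items}</ul>'
```

For instance, `render_vocabulary({"?": "Show an informative link", "!": "Leave a comment"})` yields one list item per symbol, matching the question-mark and exclamation-mark vocabulary described for touch gestures.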

In an embodiment, the modified web-page that includes at least one multimedia element with the added link can be sent directly to the web browser (e.g., browser 120-1) requesting the web-page. This requires establishing a data session between the server 130 and the web browsers 120. In another embodiment, the multimedia element including the added link is returned to a web server (e.g., server 150-1) hosting the requested web-page. The web server (e.g., server 150-1) then returns the requested web-page, with the multimedia element containing the added link, to the web browser (e.g., browser 120-1) requesting the web-page. Once the “modified” web-page is displayed in the web browser, a detected user gesture and/or the occurrence of an event over the multimedia element causes the browser to load the content addressed by the link added to the multimedia element.
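The modification of the web-page can be sketched as wrapping the selected image in an anchor that points to the informative resource. The function `add_link_to_image` is hypothetical, and the regular expression is a shortcut for brevity: it assumes a double-quoted `id` attribute preceded by whitespace and would not cope with every valid HTML form, so a real implementation would more likely use an HTML parser or manipulate the DOM in the browser.

```python
import html
import re


def add_link_to_image(page: str, element_id: str, link: str) -> str:
    """Wrap the first <img> whose id equals `element_id` in an <a> tag.

    Regex-based for brevity only; single-quoted or unquoted id attributes
    are not matched.
    """
    pattern = re.compile(
        r'<img\b[^>]*\sid="' + re.escape(element_id) + r'"[^>]*>',
        re.IGNORECASE,
    )
    # Escape the URL so characters such as & and " are safe inside href.
    href = html.escape(link, quote=True)
    return pattern.sub(lambda m: f'<a href="{href}">{m.group(0)}</a>', page, count=1)
```

If no image carries the given identifier, the page is returned unchanged, which corresponds to the case where no related content item was found.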

In S670, it is checked whether one or more additional gestures and/or events have occurred; if so, execution continues with S610; otherwise, execution terminates.
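The control flow of S610 through S670 can be summarized as a loop that drains pending gesture/event reports. In the sketch below, each step is injected as a callable so the loop stays independent of how analysis, signature generation, and search are actually done; the function `run` and its parameter names are hypothetical.

```python
from collections import deque
from typing import Any, Callable, Deque, List, Optional


def run(
    reports: Deque[Any],
    analyze: Callable[[Any], List[Any]],        # S620: report -> portions of interest
    sign: Callable[[Any], Any],                 # S630: portion -> signature
    find: Callable[[Any, Any], Optional[str]],  # S640/S650: (signature, report) -> link or None
    modify: Callable[[Any, str], Any],          # S660: (report, link) -> modified element
) -> List[Any]:
    """Process reports until none remain (S670); return the modified elements."""
    results: List[Any] = []
    while reports:                  # S670: more gestures and/or events pending?
        report = reports.popleft()  # S610: receive the next report
        for portion in analyze(report):
            link = find(sign(portion), report)
            if link is not None:
                results.append(modify(report, link))
    return results
```

Injecting the steps as functions keeps the loop testable with trivial stand-ins, as in the test below, before any real signature or search back end exists.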

As another non-limiting example, a touch gesture associated with a question mark in the vocabulary may provide an informative link, and a touch gesture associated with an exclamation mark in the vocabulary may provide a link through which the user is able to respond to the image by, e.g., leaving a written comment regarding the image.

As a further non-limiting example, the multimedia content element may be a video clip of a music video of a particular song. The video clip may additionally be associated with a content item related to purchasing the song, and the link to this server content may be tied to the combination of the event that a user views the video for at least 30 seconds and the gesture of swiping on a touch screen. If a user views the clip for one minute and then swipes the touch screen, the user is provided with a link to a website that allows the user to purchase the song.
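The music-video rule can be expressed as a small predicate over a time-ordered log of viewing and swipe actions. The description does not say whether the 30 seconds must be continuous or cumulative; the sketch assumes cumulative viewing time accrued before the swipe. The action labels and the function name are hypothetical.

```python
from typing import List, Optional, Tuple

MIN_VIEW_SECONDS = 30.0  # threshold from the music-video example


def purchase_link_triggered(timeline: List[Tuple[float, str]]) -> bool:
    """`timeline` holds time-ordered (seconds, action) pairs, where action is
    one of the hypothetical labels "view_start", "view_stop", or "swipe"."""
    viewed = 0.0
    started: Optional[float] = None
    for t, action in timeline:
        if action == "view_start" and started is None:
            started = t
        elif action == "view_stop" and started is not None:
            viewed += t - started  # close the current viewing interval
            started = None
        elif action == "swipe":
            # Include any viewing interval still open at the time of the swipe.
            current = viewed + (t - started if started is not None else 0.0)
            if current >= MIN_VIEW_SECONDS:
                return True
    return False
```

The scenario in the text, viewing for one minute and then swiping, corresponds to `[(0, "view_start"), (60, "swipe")]` and triggers the purchase link; a swipe after only 10 seconds of viewing does not.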

The various embodiments disclosed herein can be implemented as hardware, firmware, software, or any combination thereof. Moreover, the software is preferably implemented as an application program tangibly embodied on a program storage unit or computer readable medium consisting of parts, or of certain devices and/or a combination of devices. The application program may be uploaded to, and executed by, a machine comprising any suitable architecture. Preferably, the machine is implemented on a computer platform having hardware such as one or more central processing units (“CPUs”), a memory, and input/output interfaces. The computer platform may also include an operating system and microinstruction code. The various processes and functions described herein may be either part of the microinstruction code or part of the application program, or any combination thereof, which may be executed by a CPU, whether or not such a computer or processor is explicitly shown. In addition, various other peripheral units may be connected to the computer platform, such as an additional data storage unit and a printing unit. Furthermore, a non-transitory computer readable medium is any computer readable medium except for a transitory propagating signal.

All examples and conditional language recited herein are intended for pedagogical purposes to aid the reader in understanding the principles of the invention and the concepts contributed by the inventor to furthering the art, and are to be construed as being without limitation to such specifically recited examples and conditions. Moreover, all statements herein reciting principles, aspects, and embodiments of the invention, as well as specific examples thereof, are intended to encompass both structural and functional equivalents thereof. Additionally, it is intended that such equivalents include both currently known equivalents as well as equivalents developed in the future, i.e., any elements developed that perform the same function, regardless of structure.

1. A method for providing at least one link to a content item related to a multimedia content element in response to an on-image gesture, comprising: receiving, from a user device, at least the on-image gesture and an indication of the multimedia content element, the multimedia content element being initially non-interactive; analyzing the on-image gesture to identify at least one portion of the multimedia content element in which a user is interested; generating at least one signature for each of the identified at least one portion; determining a content item corresponding to the at least one portion of the multimedia content element, wherein the determination is based at least on a type of the on-image gesture and the at least one signature; and modifying the multimedia content element to include at least a link to an informative resource containing the content item, thereby making the multimedia content element interactive.
2. The method of claim 1, further comprising: receiving at least one event related to the multimedia content element; and wherein determining the content item corresponding to the at least one identified portion of the multimedia content element is further based on the at least one event.
3. The method of claim 1, wherein the on-image gesture is any one of: a touch gesture, a scroll-over, and a mouse click, wherein the touch gesture is detected on a user device having a touch screen display.
4. The method of claim 2, wherein the at least one event is at least one of: viewing the multimedia content element for a specified period of time and interacting with the multimedia content element for a specified period of time.
5. The method of claim 2, further comprising: determining a type of the on-image gesture and a type of the at least one event; and determining a type of the content item based on at least one of: the type of the on-image gesture and the type of the at least one event.
6. The method of claim 1, further comprising: determining a context of the multimedia content element based on the generated signature; and determining the content item based on the context of the multimedia content element respective of the generated signature.
7. The method of claim 1, wherein any one of the multimedia content element and the content item is at least one of: an image, graphics, a video stream, a video clip, an audio stream, an audio clip, a video frame, a photograph, images of signals, combinations thereof, and portions thereof.
8. The method of claim 1, wherein the at least one link is added to the multimedia content element as an overlay object, wherein the modified multimedia content is embedded in a web page displayed in a web-browser of the user device.
9. The method of claim 8, wherein the overlay object comprises a vocabulary of at least one on-image gesture determined as corresponding to the at least one identified portion of the multimedia content element.
10. The method according to claim 1, wherein the multimedia content element is an image that is included in a web page.
11. The method according to claim 1, wherein the at least one signature represents a response of at least one neural network to each of the identified at least one portion.
12. The method according to claim 1, further comprising generating the at least one signature by a plurality of mutually independent computational cores.
13. The method according to claim 1, wherein the steps of (a) analyzing, (b) generating, (c) determining, and (d) modifying are executed by a server; and wherein the method further comprises sending, to the user device, the multimedia content element that includes at least the link to an informative resource containing the content item.
14. A non-transitory computer readable medium having stored thereon instructions for causing one or more processing units to execute the method according to claim 1.
15. A system for providing at least a link to a content item related to a multimedia content element in response to a user gesture, comprising: an interface to a network for receiving a uniform resource locator (URL) of a web-page containing a multimedia content element and at least one on-image gesture related to the multimedia content element; a processor; and a memory coupled to the processor, the memory containing instructions that, when executed by the processor, cause the system to: receive, from a user device, the at least one on-image gesture and the multimedia content element, the multimedia content element being initially non-interactive; analyze the at least one on-image gesture to identify at least one portion of the multimedia content element in which a user is interested; generate at least one signature for each of the identified at least one portion; determine a content item corresponding to the identified at least one portion of the multimedia content element, wherein the determination is based at least on a type of the at least one on-image gesture and the at least one signature; and modify the multimedia content element to include at least a link to an informative resource containing the content item, thereby making the multimedia content element interactive.
16. The system of claim 15, wherein the system is further configured to: receive at least one event related to the multimedia content element; and wherein the content item corresponding to the at least one identified portion of the multimedia content element is further determined based on the at least one event.
17. The system of claim 16, wherein the on-image gesture is any one of: a touch gesture, a scroll-over, and a mouse click, wherein the touch gesture is detected on a user device having a touch screen display.
18. The system of claim 16, wherein the at least one event is at least one of: viewing the multimedia content element for a specified period of time and interacting with the multimedia content element for a specified period of time.
19. The system of claim 16, wherein the system is further configured to: determine a type of the on-image gesture and a type of the at least one event; and determine a type of the content item based on at least one of: the type of the on-image gesture and the type of the at least one event.
20. The system of claim 16, wherein the system is further configured to: determine a context of the multimedia content element based on the generated signature; and determine the content item based on the context of the multimedia content element respective of the generated signature.
21. The system of claim 15, wherein any one of the multimedia content element and the content item is at least one of: an image, graphics, a video stream, a video clip, an audio stream, an audio clip, a video frame, a photograph, images of signals, combinations thereof, and portions thereof.
22. The system of claim 15, wherein the at least one link is added to the multimedia content element as an overlay object, wherein the modified multimedia content is embedded in a web page displayed in a web-browser of the user device.
23. The system of claim 22, wherein the overlay object comprises a vocabulary of the at least one on-image gesture determined as corresponding to the at least one identified portion of the multimedia content element.